Mapping the Object-Role Modeling language 
ORM2 into Description Logic language $\mathcal{DLR}_{ifd}$*



C. Maria Keet 

Faculty of Computer Science, Free University of Bozen-Bolzano, Italy 
keet@inf.unibz.it



Abstract. In recent years, several efforts have been made to enhance
conceptual data modelling with automated reasoning to improve the
model's quality and derive implicit information. One approach to achieving
this in implementations is to constrain the language. Advances in
Description Logics can help in choosing a language that offers the greatest
expressiveness while remaining within a decidable fragment of first order
logic, so as to realise a workable implementation with good performance using
DL reasoners. The best-fit DL language appears to be the ExpTime-complete
$\mathcal{DLR}_{ifd}$. To illustrate trade-offs and highlight features of the
modelling languages, we present a precise transformation of the mappable
features of the very expressive (undecidable) ORM/ORM2 conceptual
data modelling languages to exactly $\mathcal{DLR}_{ifd}$. Although not all
ORM2 features can be mapped, this is an interesting fragment because it
has been shown that $\mathcal{DLR}_{ifd}$ can also encode UML Class Diagrams and
EER, and therefore can foster interoperation between conceptual data
models and research into ontological aspects of the modelling languages.



1 Introduction 

Various strategies and technologies are being developed to reason over
conceptual data models to meet the same or slightly different requirements
and aims. An important first distinction is between two assumptions. The
first is that modellers should keep total freedom to model what they deem
necessary to represent, and subsequently either restrict which parts can be
used for reasoning or accept slow performance. The second is that it is
better to constrain the language a priori to a subset of first order logic so
as to achieve better performance and a guarantee that the reasoner
terminates. The former approach
is taken by [1] using a dependency graph of the constraints in a UML Class

* This is an updated version of Keet, C.M. Mapping the Object-Role Modeling
language ORM2 into Description Logic language $\mathcal{DLR}_{ifd}$. KRDB Research
Centre Technical Report KRDB07-2, Free University of Bozen-Bolzano, 15-2-2007.
http://arxiv.org/abs/cs.LO/0702089v1. It discusses additional recent literature, has
the ORM figures made with the new tool NORMA, makes the mappable elements
more readable (including fixing some typos in the mappings and adding more
explanations), and has the proofs of correctness of encoding.



Diagram + OCL and by first order logic (FOL) theorem provers. The latter
approach is taken by [2,3,4,5,6,7,8,9,10], who experiment with different techniques.
For instance, [2,3] use special purpose reasoners for ORM and UML Class
Diagrams, [4,5] encode a subset of UML class diagrams as a Constraint Satisfaction
Problem, and [6,7,8,9,10] use a Description Logic (DL) framework for UML Class
Diagrams, ER, EER, and ORM. Of the many DL languages experimented with,
$\mathcal{DLR}_{ifd}$ provides the best trade-off between expressiveness and available features
for conceptual data modelling languages for UML Class Diagrams without OCL
and the less expressive EER. However, one would also want to include at least
some OCL or cater for a richer language such as Object-Role Modeling. In fact,
whereas for some constraints in the UML framework one has to resort to OCL,
with ORM there are still icons in the graphical language. In addition, ORM
explicitly has external uniqueness, which is a more general version of UML's
qualified association, and role subsetting, which corresponds to subsetting of UML's
association ends. A mapping for these constructs can therefore easily be added
to the UML to $\mathcal{DLR}_{ifd}$ mapping of [11] as well, so as to be more faithful to
the UML specification [12]. In addition, ORM is a so-called "true" conceptual
modelling language in the sense that it is independent of the application
scenario and it has been mapped into both UML class diagrams and ER [13]. That
is, ORM and its successor ORM2 can be used in the conceptual analysis stage
for database development, application software development, requirements
engineering, business rules, and other areas [13,14,15,16,17]. Thus, if there is an
ORM-to-DL mapping, the possible applications for automated reasoning
services can be greatly expanded. Furthermore, given that EER and a restricted
version of UML Class Diagrams also have a $\mathcal{DLR}_{ifd}$ encoding, this would greatly
simplify model interoperability and let each language benefit from the others' strengths.
Therefore, our aim is to map (most of) ORM and its successor, ORM2, into
$\mathcal{DLR}_{ifd}$; the mappable constraints and correctness of encoding will be presented
in section 3. This will also give a clear view on trade-offs between DL languages
and the choice for using DL for automated reasoning over conceptual models.
The reader may be aware of previous work by Jarrar [H] that attempted to
complete the same task. The two main problems with that work are that several of
his mapping "rules" did not remain within $\mathcal{DLR}_{ifd}$, with various constructors and
features borrowed from other DL languages (hence, the mapping in toto is
not to any particular DL language), and that a few ORM constraints were incorrect
or incomplete. We shall discuss these issues, as well as some general reflections,
in section 4. Finally, we close with conclusions in section 5.

2 The $\mathcal{DLR}_{ifd}$ language

Description Logics (DL) languages are decidable fragments of first order logic
and are used for logic-based knowledge representation, such as conceptual
modelling and ontology development. The basic ingredients of all DL languages are
concepts and roles, where a DL-role is an n-ary predicate ($n \geq 2$), and
several constructs, thereby giving greater or lesser expressivity and efficiency of



automated reasoning over a logical theory. DL knowledge bases are composed
of the Terminological Box (TBox), which contains axioms at the concept level,
and the Assertional Box (ABox), which contains assertions about instances. A
TBox corresponds to a formal conceptual data model or can be used to
represent a type-level ontology; refer to [H] for more information about DLs and
their usages. For formal conceptual data modelling, we use $\mathcal{DLR}_{ifd}$ [19]. This
DL language was developed to provide a formal characterization of conceptual
modelling languages, to enable automated reasoning over the conceptual models
so as to improve their quality and that of the resulting software application, and to
use it as a unifying paradigm for database integration through integrating their
respective conceptual models [20]. Take atomic relations $P$ and atomic
concepts $A$ as the basic elements of $\mathcal{DLR}_{ifd}$, which allow us to construct arbitrary
relations (arity $\geq 2$) and arbitrary concepts according to the following syntax:

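$$R \longrightarrow \top_n \mid P \mid (\$i/n : C) \mid \neg R \mid R_1 \sqcap R_2$$

$$C \longrightarrow \top_1 \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid \exists[\$i]R \mid\ \leq k[\$i]R$$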
where $i$ denotes a component of a relation (the equivalent of an ORM-role); if
components are not named, then integer numbers between 1 and $n_{max}$ are used,
where $n$ is the arity of the relation; $k$ is a nonnegative integer for cardinality
constraints. Only relations of the same arity can be combined to form expressions
of type $R_1 \sqcap R_2$, and $i \leq n$, i.e., the concepts and relations must be well-typed.
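
The well-typedness conditions can be illustrated with a small sketch. The Python below is not part of any DL reasoner: the class names (`Top`, `Atomic`, `Select`, `Not`, `And`) and the function `arity` are hypothetical, chosen here only to mirror the relation syntax above. It checks that $i \leq n$ holds in every $(\$i/n : C)$ and that both operands of $\sqcap$ have the same arity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Top:            # top_n
    n: int


@dataclass(frozen=True)
class Atomic:         # atomic relation P of arity n
    name: str
    n: int


@dataclass(frozen=True)
class Select:         # ($i/n : C), with C given by name only
    i: int
    n: int
    concept: str


@dataclass(frozen=True)
class Not:            # not R
    arg: "Rel"


@dataclass(frozen=True)
class And:            # R1 and R2
    left: "Rel"
    right: "Rel"


Rel = Union[Top, Atomic, Select, Not, And]


def arity(r: Rel) -> int:
    """Return the arity of r, or raise ValueError if r is not well-typed."""
    if isinstance(r, (Top, Atomic)):
        if r.n < 1:
            raise ValueError("arity must be positive")
        return r.n
    if isinstance(r, Select):
        if not 1 <= r.i <= r.n:
            raise ValueError(f"component {r.i} outside 1..{r.n}")
        return r.n
    if isinstance(r, Not):
        return arity(r.arg)
    if isinstance(r, And):
        left, right = arity(r.left), arity(r.right)
        if left != right:
            raise ValueError(f"arity mismatch: {left} vs {right}")
        return left
    raise TypeError(f"not a relation expression: {r!r}")
```

For instance, `arity(And(Atomic("P", 3), Select(2, 3, "Hospital")))` yields 3, whereas combining a ternary with a binary relation raises `ValueError`.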
Further, $\mathcal{DLR}_{ifd}$ has identification assertions on a concept $C$, which have the
form $(\mathbf{id}\ C\ [i_1]R_1, \ldots, [i_h]R_h)$, where each $R_j$ is a relation and each $i_j$ denotes one
component of $R_j$. Then, if $a$ is an instance of $C$ that is the $i_j$-th component of
a tuple $t_j$ of $R_j$, for $j \in \{1, \ldots, h\}$, and $b$ is an instance of $C$ that is the $i_j$-th
component of a tuple $s_j$ of $R_j$, for $j \in \{1, \ldots, h\}$, and for each $j$, $t_j$ agrees with
$s_j$ in all components different from $i_j$, then $a$ and $b$ are the same object [19].
This gives greater flexibility in how to identify DL-concepts, most notably external
uniqueness (/weak entity types/qualified associations [13]) and objectification.
Second, $\mathcal{DLR}_{ifd}$ has non-unary functional dependency assertions on a relation
$R$, which have the form $(\mathbf{fd}\ R\ i_1, \ldots, i_h \rightarrow j)$, where $h \geq 2$, and $i_1, \ldots, i_h, j$ denote
components of $R$ (unary fds lead to undecidability [TH]), and for all $t, s \in R^{\mathcal{I}}$, we
have that $t[i_1] = s[i_1], \ldots, t[i_h] = s[i_h]$ implies $t[j] = s[j]$. This is useful primarily for
UML class diagram's methods and ORM's derived-and-stored fact types. The
model-theoretic semantics of $\mathcal{DLR}$ is specified through the usual notion of
interpretation, where $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, and the interpretation function $\cdot^{\mathcal{I}}$ assigns to each
concept $C$ a subset $C^{\mathcal{I}}$ of $\Delta^{\mathcal{I}}$ and to each n-ary $R$ a subset $R^{\mathcal{I}}$ of $(\Delta^{\mathcal{I}})^n$, such
that the conditions in Table 1 are satisfied. Observe that id and fd are
not mentioned in the semantics for $\mathcal{DLR}_{ifd}$ in Table 1: there are no changes in
semantic rules because the algorithms for them are checked against a (generalized)
ABox US].
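
To make the two kinds of assertion concrete, the following sketch checks them over a finite set of tuples. It is an illustration under assumptions, not the reasoning procedure of the cited work: the function names `id_holds` and `fd_holds`, the 1-based component indices, and the representation of relations as Python sets of tuples are all choices made here.

```python
from itertools import combinations


def others(t, i):
    """Tuple t without its i-th component (i is 1-based)."""
    return t[:i - 1] + t[i:]


def id_holds(C, parts):
    """Check (id C [i_1]R_1, ..., [i_h]R_h) on finite extensions.

    C is a set of objects; parts is a list of (i_j, R_j) pairs, where R_j is
    a set of tuples. The assertion is violated if two distinct instances of C
    occur, for every j, as i_j-th component of tuples of R_j that agree on
    all other components.
    """
    for a, b in combinations(C, 2):
        if all(
            any(t[i - 1] == a and s[i - 1] == b and others(t, i) == others(s, i)
                for t in R for s in R)
            for i, R in parts
        ):
            return False
    return True


def fd_holds(R, lhs, j):
    """Check (fd R i_1, ..., i_h -> j): equal lhs components force equal j."""
    seen = {}
    for t in R:
        key = tuple(t[i - 1] for i in lhs)
        if seen.setdefault(key, t[j - 1]) != t[j - 1]:
            return False
    return True
```

As a usage example with made-up data: with `F = {"f1", "f2"}`, `p7 = {("f1", "h1"), ("f2", "h1")}` and `p8 = {("f1", "i1"), ("f2", "i2")}`, the call `id_holds(F, [(1, p7), (1, p8)])` succeeds, because the two instances share a value in `p7` but not in `p8`.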

A knowledge base is a finite set $\mathcal{KB}$ of $\mathcal{DLR}_{ifd}$ axioms of the form $C_1 \sqsubseteq C_2$
and $R_1 \sqsubseteq R_2$. An interpretation $\mathcal{I}$ satisfies $C_1 \sqsubseteq C_2$ ($R_1 \sqsubseteq R_2$) if and only if
the interpretation of $C_1$ ($R_1$) is included in the interpretation of $C_2$ ($R_2$), i.e.
$C_1^{\mathcal{I}} \subseteq C_2^{\mathcal{I}}$ ($R_1^{\mathcal{I}} \subseteq R_2^{\mathcal{I}}$). $\top_1$ denotes the interpretation domain, $\top_n$ for $n \geq 1$
denotes a subset of the n-cartesian product of the domain, which covers all
introduced n-ary relations; hence $\neg$ on relations means difference rather than



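Table 1. Semantics of $\mathcal{DLR}_{ifd}$.

$$\begin{array}{ll}
\top_n^{\mathcal{I}} \subseteq (\Delta^{\mathcal{I}})^n & A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \\
P^{\mathcal{I}} \subseteq \top_n^{\mathcal{I}} & (\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\neg R)^{\mathcal{I}} = \top_n^{\mathcal{I}} \setminus R^{\mathcal{I}} & (C_1 \sqcap C_2)^{\mathcal{I}} = C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}} \\
(R_1 \sqcap R_2)^{\mathcal{I}} = R_1^{\mathcal{I}} \cap R_2^{\mathcal{I}} & (\$i/n : C)^{\mathcal{I}} = \{(d_1, \ldots, d_n) \in \top_n^{\mathcal{I}} \mid d_i \in C^{\mathcal{I}}\} \\
\top_1^{\mathcal{I}} = \Delta^{\mathcal{I}} & (\exists[\$i]R)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid \exists (d_1, \ldots, d_n) \in R^{\mathcal{I}} .\, d_i = d\} \\
 & (\leq k[\$i]R)^{\mathcal{I}} = \{d \in \Delta^{\mathcal{I}} \mid |\{(d_1, \ldots, d_n) \in R^{\mathcal{I}} \mid d_i = d\}| \leq k\}
\end{array}$$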



the complement. The $(\$i/n : C)$ denotes all tuples in $\top_n$ that have an instance of
$C$ as their $i$-th component. The following abbreviations can be used: $\bot$ for $\neg\top_1$;
$C_1 \sqcup C_2$ for $\neg(\neg C_1 \sqcap \neg C_2)$; $C_1 \Rightarrow C_2$ for $\neg C_1 \sqcup C_2$; $(\geq k[i]R)$ for $\neg(\leq k-1[i]R)$;
$\exists[i]R$ for $(\geq 1[i]R)$; $\forall[i]R$ for $\neg\exists[i]\neg R$; $R_1 \sqcup R_2$ for $\neg(\neg R_1 \sqcap \neg R_2)$; and $(i : C)$ for
$(i/n : C)$ when $n$ is clear from the context. Note that for a qualified role, as in
$\exists P.C$ and represented in $\mathcal{DLR}_{ifd}$ as $\exists[\$1](P \sqcap (\$2/2 : C))$, its inverse, $\exists P^{-}.C$, is
represented as $\exists[\$2](P \sqcap (\$1/2 : C))$; likewise for universal quantification ($\forall P.C$
as $\neg\exists[\$1](P \sqcap (\$2/2 : \neg C))$ and its inverse $\forall P^{-}.C$ as $\neg\exists[\$2](P \sqcap (\$1/2 : \neg C))$)
( [15] chapter 5).
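
The semantic conditions and abbreviations can be tried out on a finite interpretation. The sketch below is only an illustration: the helper names (`select`, `exists`, `at_most`, `at_least`), the use of Python sets for $\Delta^{\mathcal{I}}$ and for relation extensions, and the tiny population are assumptions made here, and $\top_2$ is simply taken to be the full Cartesian product.

```python
from itertools import product


def select(top_n, i, C):
    """($i/n : C): tuples of top_n whose i-th component is in C."""
    return {t for t in top_n if t[i - 1] in C}


def exists(delta, i, R):
    """Exists[$i]R: objects occurring as i-th component of some tuple of R."""
    return {d for d in delta if any(t[i - 1] == d for t in R)}


def at_most(delta, k, i, R):
    """(<= k[$i]R): objects occurring as i-th component in at most k tuples."""
    return {d for d in delta if sum(1 for t in R if t[i - 1] == d) <= k}


def at_least(delta, k, i, R):
    """(>= k[$i]R), defined as the complement of (<= k-1[$i]R)."""
    return delta - at_most(delta, k - 1, i, R)


# Hypothetical finite interpretation.
delta = {"a", "b", "c", "d"}
top2 = set(product(delta, repeat=2))
P = {("a", "b"), ("a", "c"), ("b", "d")}
C = {"b", "c"}

# Qualified existential, Exists P.C, written as Exists[$1](P and ($2/2 : C)).
exists_P_C = exists(delta, 1, P & select(top2, 2, C))

# Qualified universal, Forall P.C, written as
# not Exists[$1](P and ($2/2 : not C)).
forall_P_C = delta - exists(delta, 1, P & select(top2, 2, delta - C))
```

Here only `a` has a `P`-successor in `C`, while `b` is the one object with a `P`-successor outside `C`, so it is the only object excluded from `forall_P_C`.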



3 ORM2 to $\mathcal{DLR}_{ifd}$ transformation



After a brief ORM introduction and the technical preliminaries (section 3.2),
the mappable elements of ORM2 into $\mathcal{DLR}_{ifd}$ are listed in section 3.3, which is
followed by the proof of correctness of encoding.



3.1 Brief informal overview of ORM 

The basic building blocks of the ORM language are object type (class), value
type (data type), fact type (typed relation), role (that which the object or value
type 'plays' in the relation), and a plethora of constraints. ORM supports n-ary
relations, where $n \geq 1$, and this n-ary predicate, $P$, is composed of $r_1, \ldots, r_n$
roles where each role has one object type, denoted with $C_1, \ldots, C_n$, associated
with it. Roles and predicates are globally unique, even though the 'surface
labeling' by the modeller or domain expert may suggest otherwise. An example
of a fact type is shown in Fig. 1, which was made with the NORMA CASE tool
(http://sourceforge.net/projects/orm/). The diagrammatic representation has

1) the name of the relation, which is displayed in the properties box of the 
relation and is, in this example, generated automatically by the software 
and called PatientIsAdmittedToHospitalAtDateDate;

2) role names, such as the manually named [haspatients] for the role that
object type Hospital plays;

3) a label attached to the relation, "... is admitted to ... at date ...", which is used
for the verbalization by filling the ellipses with the names of the participating
object types (Patient is admitted to Hospital at date Date);

4) two object types, Patient and Hospital, and a value type Date; 



5) a reference scheme for each object type, shown in compact format with (ID) 
for Patient and (name) for Hospital; 

6) spanning uniqueness constraint drawn with a line next to the roles included 
in the uniqueness constraint, in this case all three roles are included; 

7) mandatory participation of Hospital in the fact type, denoted with a blob. 

Many more features are available in the language than are illustrated in this
example. We deal with them in the next subsection and Fig. 2.

ORM diagrams can be transformed more or less into, among others, ER
and UML Class Diagrams, IDEF1X, SQL table definitions, C, Visual Basic,
and XML. More information on these modelling-, design-, and
implementation-oriented transformations can be found in, e.g., [13,16].



[Fig. 1, panel B (graphic): object types Hospital (name) and Patient (ID), with role name [haspatients]. The text of panel A follows.]

Patient is an entity type. 
Reference Scheme: Patient has Patient_ID. 
Reference Mode: ID. 
Hospital is an entity type. 
Reference Scheme: Hospital has Hospital_name 
Reference Mode: name. 
Date is a value type. 

Portable data type: Temporal: Date &Time. 
Patient is admitted to Hospital at Date. 

It is possible that more than one Patient is admitted to the same Hospital at the same Date 
and that the same Patient is admitted to more than one Hospital at the same Date 
and that the same Patient is admitted to the same Hospital at more than one Date. 

Each Date, Hospital, Patient combination occurs at most once in the population of Patient is 
admitted to Hospital at Date. 

For each Hospital, some Patient is admitted to that Hospital at some Date. 
Fig. 1. A: verbalization of the small ORM2 conceptual model, consisting of one
fact type; B: graphical depiction of the ORM2 conceptual model, depicting two
object types, a value type, a ternary relation, label for the reading, name of
the first role in "[ ]", mandatory (blob) and uniqueness (line) constraints; C:
properties box of the fact type, displaying the name of the relation



3.2 Preliminaries 



The transformation presented here assesses all components and constraints of
ORM2, hence also of ORM, except deontic constraints, which were recently added
to ORM2 (compared to ORM in [21], ORM2 also supports exclusive total
covering of subtypes, role values, and deontic constraints). As starting point, we
used the ORM formalisation by Halpin [21], where available, which was the first
formalisation of ORM. Other formalizations of ORM [22,23] do not differ
significantly from Halpin's version. They make clearer distinctions between roles
and predicates and the relation between them, and between the naming and the
labeling of ORM elements, but they cover fewer constraints. In the following, we take
this same unambiguous approach of [22,23]. That is, any ORM model has for
each predicate a surface reading label, such as "...admitted to...at date..." in Fig. 1,
a predicate name, which could be hospitalAdmission that is formally typed as
$\forall x,y,z\,(hospitalAdmission(x,y,z) \rightarrow Patient(x) \land Hospital(y) \land Date(z))$, and
each of the roles can bear a name, or be simply indexed from left to right or top
to bottom ($r_1$, $r_2$, and $r_3$) throughout the model (all roles and predicates are
uniquely identified). An important consequence of such a commitment concerns
the customary ORM practice of providing "forward" and "backward" reading
labels so as to make nice pseudo-natural language sentences that can be verified
by the domain expert; for instance, with a label orders / orderedBy, one then
reads a fact type as Customer orders Book and Book orderedBy Customer. These,
however, are reading labels and do not necessarily imply that orders(x,y) has
orderedBy(y,x) as its inverse relation; in fact, that particular predicate may
well be named ordering and the reading labels are just that. Ter Hofstede and
Proper [23] refer to the distinction between predicate and role names versus
reading labels as deep semantics versus surface semantics; for the ORM2 to
$\mathcal{DLR}_{ifd}$ transformation, we are interested in the former.

3.3 Mappable elements and constraints 

The text in boldface in the following list indicates the name of the element or
constraint, followed by the FOL characterisation taken from or based on [21], a
"mapped to" with the $\mathcal{DLR}_{ifd}$ representation, and a reference to Fig. 2 for its
graphical notation in the NORMA CASE tool. At times, we will use abbreviations
in the FOL: to shorten the representation of an n-ary relation, we use an underlined
variable, as in $\underline{x}$, which is an abbreviation for a sequence $x_1, \ldots, x_n$ in an n-ary
relation; when we need access to one or two of the
participating variables in the predicate, the sequence comprises $x_1, \ldots, x_{n-2}$
(the difference is clear from the context).

1. Object type, $\forall x\, C(x)$
mapped to: C 

example: the solid roundtangles such as A 

2. Named value type (data type or lexical type), which permits values of
some set $\{v_1, \ldots, v_n\}$ where the values of C are not constrained to specific
values, and the value type C, thus $\forall x(C(x) \equiv x \in \{v_1, \ldots, v_n\})$




Fig. 2. ORM2 model with most constraints in most of the allowable
tions. All object types should have a reference scheme like O and Q, where Q 
has the default notation to unclutter the diagram and O shows the expanded 
full representation. 



mapped to: C, the concrete domain (T) of the value type can be a user-defined
or built-in one, such as String and Integer; we then have, like in p97,
that a relation, R, from C to concrete domain of type T is represented as
$C \sqsubseteq \forall[r1](R \Rightarrow (r2 : T))$

example: the dashed roundtangles such as B, H, and I (the domain is shown 
only in the properties box). 

3. Unary relation, $\forall x(R(x) \rightarrow C_i(x))$

mapped to: $R \sqsubseteq (r_i : C_i) \sqcap (r_j : C')$ where $C'$ is an auxiliary, newly introduced
filler object- or value type for the other position in the relation
example: role p1 that is connected to C.

4. Binary relation, $\forall x,y(R(x,y) \rightarrow C_i(x) \land C_j(y))$
mapped to: $R \sqsubseteq (r_i : C_i) \sqcap (r_j : C_j)$

example: p2 relating object types C and A. 

5. n-ary relation, $\forall x_1, \ldots, x_n(R(x_1, \ldots, x_n) \rightarrow C_1(x_1) \land \ldots \land C_n(x_n))$ where
$C_1, \ldots, C_n$ may be object types or value types

mapped to: $R \sqsubseteq (r_1 : C_1) \sqcap \ldots \sqcap (r_n : C_n)$ or, in short, $R \sqsubseteq \sqcap_{i=1}^{n} (r_i : C_i)$,
hence a generalisation of the previous two

example: any relation p1, ..., p24 and the 'hidden' relations for the reference
schemes (expanded for object type O).



6. Object type participating in an n-ary relation $\forall x, \underline{z}\,(C(x) \rightarrow R(x, \underline{z}))$
mapped to: $C \sqsubseteq \forall[r_i]R$

example: all object types in the figure participate in at least one relation. 

7. Named value type, where the values of the concrete domain of the value
type are constrained to specific values $\{v_1, \ldots, v_i\}$, and value type C with
$\forall x(C(x) \equiv x \in \{v_1, \ldots, v_i\})$

mapped to: $C_i \sqsubseteq \forall[r_i]R\,(R \Rightarrow (r_j : T_j) \sqcap (T_j \equiv \{v_1, \ldots, v_i\}))$ s.t. for each
instance c of $C_i$, all values related to c by R are instances of $T_j$ and have a
value $v_1$ or ... or $v_i$. The domain, T, of the value type can be a user-defined
one, such as String, Number, etc.; recall that they are values, not objects
(hence, not an enumerated class).

example: B's values being restricted to one of the strings {'a', 'b', 'c'} and L 
to integers between 18 and 65. 

8. Mandatory, n-ary predicate with mandatory on role $r_i$ and $i \leq n$:
$\forall x_i(C_i(x_i) \rightarrow \exists x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n\, R(x_1, \ldots, x_n))$

mapped to: $C_i \sqsubseteq \exists[r_i]R$

example: the blob on the G participating in p6. 

9. Disjunctive mandatory between the $i$th roles of n different relations,
where $n \geq 2$, for m-ary relations and $i \leq m$, then $\forall x(C(x) \rightarrow \exists x_1, \ldots, x_{m-1}$
$(R_1(x_1, \ldots, x_{i_1-1}, x, x_{i_1+1}, \ldots, x_{m_1}) \lor \ldots \lor R_n(x_1, \ldots, x_{i_n-1}, x, x_{i_n+1}, \ldots, x_{m_n})))$
mapped to: $C_i \sqsubseteq \sqcup_{j=1}^{n} \exists[r_j]R_j$ among n relations, each for the $j$th role, $j \leq n$
(for two roles $C_i \sqsubseteq \exists[r_1]R_1 \sqcup \exists[r_2]R_2$)

example: the first roles of p16 and p17 to which N participates, i.e., each 
instance of N must participate either in p16 or in p17 (or both). 

10. Uniqueness, n:1, [H]'s version for binary relation $\forall x,y,z(R(x,y) \land
R(x,z) \rightarrow y = z)$, which can be generalised to n-ary relations

mapped to: $C_i \sqsubseteq (\leq 1[r_i]R)$

example: N's role playing in p17, indicated with the line above the role. 

11. Uniqueness, 1:1, binary relation, i.e. two times nr 10
mapped to: $C_i \sqsubseteq (\leq 1[r_i]R)$ and $C_j \sqsubseteq (\leq 1[r_j]R)$
example: two lines above the two roles in p22 and in p12. 

12. Uniqueness, m:n on an n-ary relation, $n \geq 2$, covering all n roles, is ignored
[21]: "repetition of a proposition does not have a logical significance, and
is ignored" [H](p4-5) (This is not necessarily true, and will be discussed
elsewhere), yet the case is included in nr 13 when $i = n$

mapped to: $(\mathbf{id}\ R\ [1]r_1, \ldots, [1]r_i)$, over $i$ roles in n-ary relation, $i = n$, and R
is a reified relation (see also nr 30)

example: the line spanning two roles in the binary relation p2. 

13. Uniqueness, n-ary relation where $n > 2$, $1 \leq j \leq n$, uniqueness constraint
spans at least n-1 roles (for it to be elementary), and $j$ is excluded from the
constraint $\forall x_1, \ldots, x_j, \ldots, x_n, y\,(R(x_1, \ldots, x_j, \ldots, x_n) \land R(x_1, \ldots, y, x_{j+1}, \ldots, x_n) \rightarrow x_j = y)$

mapped to: $(\mathbf{id}\ R\ [1]r_1, \ldots, [1]r_i)$ over $i$ roles in n-ary relation, $1 < i < n$, and
R is a reified relation (see also nr 30); note that the FOL formula applies to
n-1 roles, whereas the $\mathcal{DLR}_{ifd}$ one is assumed to be applied to the first $i$ roles ($i < n$)



example: the quaternary relation p11 has a uniqueness spanning 3 roles and
the ternary p4 with a uniqueness over 2 roles.

14. External uniqueness (i) among 2 roles $\forall x_1, x_2, y, z(R_1(x_1, y) \land R_2(x_1, z)
\land R_1(x_2, y) \land R_2(x_2, z) \rightarrow x_1 = x_2)$, (ii) among m roles $\forall x_1, x_2, y_1, \ldots, y_m(
R_1(x_1, y_1) \land \ldots \land R_m(x_1, y_m) \land R_1(x_2, y_1) \land \ldots \land R_m(x_2, y_m) \rightarrow x_1 = x_2)$
mapped to: remodel as n-ary relation where $n = m + 1$ and a uniqueness over
the n-1 and use nr. 13 (i.e., $(\mathbf{id}\ R\ [1]r_1, \ldots, [1]r_i)$ with R reified), or use the
common object type to which the roles relate, i.e., $(\mathbf{id}\ C\ [r_i]R_1, \ldots, [r_i]R_m)$,
or, if no such type exists, use a placeholder C instead.

example: F is identified by value types H and I, denoted with the encircled 
line connected to the respective roles in p7 and p8. 

15. Role frequency with (i) exactly $a$ times, $a \geq 1$, then $\forall x(\exists y_1 R(x, y_1) \rightarrow
\exists y_2, \ldots, y_a (y_1 \neq y_2 \land \ldots \land y_{a-1} \neq y_a \land R(x, y_2) \land \ldots \land R(x, y_a))) \land \forall x, y_1, \ldots, y_{a+1}
(R(x, y_1) \land \ldots \land R(x, y_{a+1}) \rightarrow y_1 = y_2 \lor y_1 = y_3 \lor \ldots \lor y_a = y_{a+1})$ and (ii) at
least $a$ or (iii) at most $a$ times

mapped to: (i) $C_i \sqsubseteq (\geq a[r_i]R) \sqcap (\leq a[r_i]R)$ where $a \geq 1$, (ii) $C_i \sqsubseteq (\geq
a[r_i]R)$, and (iii) $C_i \sqsubseteq (\leq a[r_i]R)$

example: the $\geq 2$ connected to the role that value type L plays in relation
p10.

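16. Role frequency with at least $a$ and at most $b$, $1 \leq a$, and $a \leq b$, thus
$\forall x(\exists y_1 R(x, y_1) \rightarrow \exists y_2, \ldots, y_a (y_1 \neq y_2 \land \ldots \land y_{a-1} \neq y_a \land R(x, y_2) \land \ldots \land
R(x, y_a))) \land \forall x, y_1, \ldots, y_{b+1}(R(x, y_1) \land \ldots \land R(x, y_{b+1}) \rightarrow y_1 = y_2 \lor y_1 =
y_3 \lor \ldots \lor y_b = y_{b+1})$ and for an n-ary R where $n > 2$ and the amount of $\underline{z}$
is $n - 2$ roles, then $\forall x, \underline{z}(\exists y_1 R(x, y_1, \underline{z}) \rightarrow \exists y_2, \ldots, y_a (y_1 \neq y_2 \land \ldots \land y_{a-1} \neq
y_a \land R(x, y_2, \underline{z}) \land \ldots \land R(x, y_a, \underline{z}))) \land \forall x, y_1, \ldots, y_{b+1}(R(x, y_1, \underline{z}) \land \ldots \land R(x, y_{b+1},
\underline{z}) \rightarrow y_1 = y_2 \lor y_1 = y_3 \lor \ldots \lor y_b = y_{b+1})$ is

mapped to: $C_i \sqsubseteq (\geq a[r_i]R) \sqcap (\leq b[r_i]R)$ where $1 \leq a \leq b$ and $i \leq n$
example: like the $\geq 2$ of nr. 15, but then denoted with, e.g., 2..5.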

17. Proper subtype, which holds for subsumption of either object types or
of value types, but which cannot be mixed (and note that at times their
extensions may contain the same elements) $\forall x(D(x) \rightarrow C(x))$

mapped to: $D \sqsubseteq C$; one could add $C \not\sqsubseteq D$ to ensure that the concepts D and
C are never equivalent, but all DL-based reasoners check for subsumption
and do classification to detect such distinctions already anyway
example: G is a subtype of F and AB of I, denoted with the arrow. 

18. Subtypes, total (exhaustive) covering (not formalised in [21])
mapped to: $C \sqsubseteq D_1 \sqcup \ldots \sqcup D_n$, where the indexed concepts D are subtypes
of C, in short: $C \sqsubseteq \sqcup_{i=1}^{n} D_i$

example: encircled blob between the two subtype arrows of V and W toward 
their common supertype U, likewise for value types AB, AC and I. 

19. Exclusive (disjoint) subtypes (not formalised in [21])

mapped to: defined among the $1, \ldots, n$ subtypes of C, then $D_i \sqsubseteq \sqcap_{j=i+1}^{n} \neg D_j$
and $D_i \sqsubseteq C$ for each $i \in \{1, \ldots, n\}$

example: encircled X between the arrows of T and U that are subtypes of S. 



20. Exclusive subtypes, total (not formalised in [21])
mapped to: use nr 18 & nr 19

example: encircled X with blob on the arrows that subtype V into X and Y. 

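21. Subset over two roles $r_i$ and $r_j$ in two n-ary relations $R_j$ and $R_i$ then
for binary $\forall x(\exists y R_j(x, y) \rightarrow \exists z R_i(x, z))$ and for an n-ary R where $n > 2$ and
the amount of $\underline{w}$ is $n - 2$ roles, then $\forall x, \underline{w}(\exists y R_j(x, y, \underline{w}) \rightarrow \exists z R_i(x, z, \underline{w}))$
mapped to: $[r_i]R_j \sqsubseteq [r_i]R_i$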

example: the two roles to which Z participate in p21 and p24, the latter being 
a subset of the former. 

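22. Subset over two n-ary relations, for binary $\forall x,y(R_j(x,y) \rightarrow R_i(x,y))$
and for n-ary relation, $\forall x, y(\exists \underline{z}(R_j(\underline{z}) \land x = z_j \land y = z_{j+1}) \rightarrow \exists \underline{w}(R_i(\underline{w}) \land
x = w_i \land y = w_{i+1}))$ from [21], our compact version $\forall \underline{x}(R_1(\underline{x}) \rightarrow R_2(\underline{x}))$
mapped to: $R_j \sqsubseteq R_i$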

example: p6 is a subset of p5; note that the lines connecting the icon to the 
relation is to the role-divider line instead of in the middle of the role. 

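23. Set-equality over two roles $r_i$ in two binary relations $R_j$, $R_i$ with
$\forall x(\exists y R_j(x, y) \equiv \exists z R_i(x, z))$ and for an n-ary R where $n > 2$ and the amount
of $\underline{w}$ is $n - 2$ roles, then $\forall x, \underline{w}(\exists y R_j(x, y, \underline{w}) \equiv \exists z R_i(x, z, \underline{w}))$

mapped to: $[r_i]R_j \equiv [r_i]R_i$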

example: an encircled equality sign (as between p20 and p21), not drawn. 

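24. Set-equality over two n-ary relations for binary $\forall x, y(R_j(x, y) \equiv R_i(x, y))$,
for n-ary relation $\forall x, y(\exists \underline{z}(R_j(\underline{z}) \land x = z_j \land y = z_{j+1}) \equiv \exists \underline{w}(R_i(\underline{w}) \land x =
w_i \land y = w_{i+1}))$ from [21], and our compact version $\forall \underline{x}(R_1(\underline{x}) \equiv R_2(\underline{x}))$
mapped to: $R_j \equiv R_i$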

example: encircled equality sign between the two relations p20 and p21. 

25. Role exclusion between two roles $r_i$ and $r_j$ each in n-ary relations $R_i$, $R_j$
(which do not necessarily have the same arity), in abbreviated form where
$x \in A =_{def} A(x)$, $R_i.r_i$ (resp. $R_j.r_j$) the $r_i$ ($r_j$) role in relation $R_i$ ($R_j$),
$1 \leq i \leq n$, then $\forall x \neg(x \in R_i.r_i \land x \in R_j.r_j)$; between n roles $r_1, \ldots, r_n$ each
one in an m-ary relation $R_1, \ldots, R_n$ (which do not necessarily have the same
arity) $\forall x \neg((x \in R_1.r_1 \land x \in R_2.r_2) \lor (x \in R_1.r_1 \land x \in R_3.r_3) \lor \ldots \lor (x \in
R_{n-1}.r_{n-1} \land x \in R_n.r_n))$

mapped to: $[r_i]R_i \sqsubseteq \neg[r_j]R_j$ for binary and $([r_1]R_1 \sqsubseteq \neg[r_2]R_2) \sqcup ([r_1]R_1 \sqsubseteq
\neg[r_3]R_3) \sqcup \ldots \sqcup ([r_{n-1}]R_{n-1} \sqsubseteq \neg[r_n]R_n)$ for $n > 2$ (and the relations have
the same arity)

example: the encircled cross with lines to each of the three roles to which M 
participates in relations p12, p13 and p14. 

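26. Relation exclusion between two relations $R_i$ and $R_j$, $\forall x, y\, \neg(\exists \underline{z}(R_i(\underline{z})
\land x = z_i \land y = z_{i+1}) \land \exists \underline{w}(R_j(\underline{w}) \land x = w_j \land y = w_{j+1}))$ from [21], our
version for n-ary relations: $\forall \underline{x}(R_1(\underline{x}) \rightarrow \neg R_2(\underline{x}))$

mapped to: $R_i \sqsubseteq \neg R_j$; note this is relational difference, not negation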
example: diagrammatic representation as in previous constraint, but then 
the connecting lines go to the role-divider lines instead of the middle of the 
roles. 



27. Join-subset among four, not necessarily distinct, relations $R_i$, $R_j$, $R_k$, $R_l$,
where $R_i * R_j[c_i, c_j]$ is the projection on columns $c_i$, $c_j$ of the natural join of
$R_i$, $R_j$. Then $R_i * R_j[c_i, c_j] \subseteq R_k * R_l[c_k, c_l]$ where the compared pairs must
belong to the same type, e.g. $r_i$ of $R_i$ and $r_k$ of $R_k$ are played by $C_a$, and
$r_j$ of $R_j$ and $r_l$ of $R_l$ are played by $C_b$ (see also the example for 3 relations in
nr 28)
mapped to: extend nr 21 for subsets of two roles; this, $([r_i]R_i \sqcap [r_j]R_j) \sqsubseteq
([r_k]R_k \sqcap [r_l]R_l)$, reduces to query containment (see |18l6j )
example: the simpler case for three relations is drawn between p18 and the
relevant roles R and S play in p17 and p16, respectively.
28. Join-equality, see nr 27 for notation, then (i) with four distinct relations
$R_i * R_j[c_i, c_j] = R_k * R_l[c_k, c_l]$ and (ii) three distinct binary relations $R_i$, $R_j$,
$R_k$ such that $\forall x, y(\exists z(R_j(z, x) \land R_k(z, y)) \equiv R_i(x, y))$

mapped to: (i) extending nr 21 for subsets of two roles, this is $([r_i]R_i \sqcap [r_j]R_j) \equiv
([r_k]R_k \sqcap [r_l]R_l)$ as query containment in both directions, see nr 27, and (ii)
as simpler version of (i), $([r_j]R_j \sqcap [r_k]R_k) \equiv ([r_i]R_i \sqcap [r_j]R_i)$
example: as in nr 27 but then with an encircled equality instead of the
encircled subset.



29. Join-exclusion, see nr 27 for notation, then $R_i * R_j[c_i, c_j] \subseteq \neg R_k * R_l[c_k, c_l]$
(see also the example for 3 relations in nr 28)

mapped to: extending nr 21 then $([r_i]R_i \sqcap [r_j]R_j) \sqsubseteq \neg([r_k]R_k \sqcap [r_l]R_l)$, see
also nr 27

example: not drawn, follows the same pattern as previous two. 

30. Objectification (nesting, reification), full uniqueness constraint over the
n roles of the n-ary relation (note the relaxation described in [53]), $R_o$ is
the objectified relation of R, $\forall x(R_o(x) \equiv \exists x_1, \ldots, x_n(R(x_1, \ldots, x_n) \land x =
(x_1, \ldots, x_n)))$

mapped to: $R \sqsubseteq \exists[1]r_1 \sqcap (\leq 1[1]r_1) \sqcap \forall[1](r_1 \Rightarrow (2 : C_1)) \sqcap \exists[1]r_2 \sqcap (\leq
1[1]r_2) \sqcap \forall[1](r_2 \Rightarrow (2 : C_2)) \sqcap \ldots \sqcap \exists[1]r_n \sqcap (\leq 1[1]r_n) \sqcap \forall[1](r_n \Rightarrow (2 : C_n))$
where the $\exists[1]r_i$ (with $i \in \{1, \ldots, n\}$) specifies that concept R must have all
components $r_1, \ldots, r_n$ of the relation R, $(\leq 1[1]r_i)$ (with $i \in \{1, \ldots, n\}$)
specifies that each such component is single-valued, and $\forall[1](r_i \Rightarrow (2 : C_i))$ (with
$i \in \{1, \ldots, n\}$) specifies the class each component has to belong to
example: p9 is objectified into "P9".

31. Derived fact type, implied by the constraints of the roles from which the 
fact is derived, i.e. the original and derived fact type relate through <-> 
mapped to: implied by the constraints of the roles from which the fact is 
derived, hence N/A 

example: not drawn; the name of the relation is appended with one asterisk 
instead of the two for derived-and-stored. 

32. Derived-and-stored fact type, or conditional derivation, where the
predicate indicates that the derivation rule provides only a partial definition of
the predicate, i.e. the original and derived fact type relate through $\rightarrow$
mapped to: use $\mathcal{DLR}_{ifd}$'s fd. With m parameters belonging to the classes
$P_1, \ldots, P_m$ (the known part of the partial definition of the predicate) and
the result belonging to R (the computed 'unknown' part of the partial
definition of the predicate), we have the relation $f_{P_1, \ldots, P_m}$ with arity
$1 + m + 1$, then $f_{P_1, \ldots, P_m} \sqsubseteq (2 : P_1) \sqcap \ldots \sqcap (m + 1 : P_m)$ with fd as $(\mathbf{fd}\
f_{P_1, \ldots, P_m}\ 1, \ldots, m+1 \rightarrow m+2)$ and the class $C \sqsubseteq \forall[1](f_{P_1, \ldots, P_m} \Rightarrow (m+2 : R))$.
Note that for a derivation rule, $m \geq 1$

example: p23**, i.e., the values for AA are calculated by some formula.
33. Role value constraint (new in ORM2): type $C_i$ only participates in role
$r_i$ if an instance has any of the values $\{v_i, \ldots, v_k\}$, which is a subset of the
set of values $C_i$ can have; for a binary relation, then $\forall x, y(x \in \{v_i, \ldots, v_k\} \rightarrow
(R(x, y) \rightarrow C_i(x) \land C_j(y)))$ holds

mapped to: split the constraint by creating a new subtype $C_i'$ for the set
of values to which the role is constrained, where the value can be any of
$\{v_i, \ldots, v_k\}$, and let $C_i'$ play role $r_i$, s.t. $C_i' \sqsubseteq C_i$ and $C_i' \sqsubseteq \forall[r_i]R$, and then use
named value types for the value constraints on $C_i'$

example: only those dates with values {01-01-05..01-01-08} can participate
when Z participates in p20, although there may be other particular dates
recorded for Z that participate in p19; clearly, this role value constraint also
holds for p21 due to the equality and for p24 due to the subsetting.
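
A few of the simpler mappings in the list can be sketched as string templates. This is an illustrative assumption about how one might emit the axioms, not a tool described in this paper: the function names are hypothetical, the axioms are produced as plain Unicode strings, and only items 5, 8, 10, 16, 17, 18, and 19 are covered.

```python
def typed_relation(R, roles):
    """Item 5: R ⊑ (r_1 : C_1) ⊓ ... ⊓ (r_n : C_n); roles is a list of (role, type)."""
    return f"{R} ⊑ " + " ⊓ ".join(f"({r} : {c})" for r, c in roles)


def mandatory(C, r, R):
    """Item 8: C ⊑ ∃[r]R."""
    return f"{C} ⊑ ∃[{r}]{R}"


def uniqueness(C, r, R):
    """Item 10: C ⊑ (≤ 1[r]R)."""
    return f"{C} ⊑ (≤ 1[{r}]{R})"


def frequency(C, r, R, a, b):
    """Item 16: C ⊑ (≥ a[r]R) ⊓ (≤ b[r]R), with 1 ≤ a ≤ b."""
    if not 1 <= a <= b:
        raise ValueError("expected 1 <= a <= b")
    return f"{C} ⊑ (≥ {a}[{r}]{R}) ⊓ (≤ {b}[{r}]{R})"


def subtype(D, C):
    """Item 17: D ⊑ C."""
    return f"{D} ⊑ {C}"


def total_cover(C, subtypes):
    """Item 18: C ⊑ D_1 ⊔ ... ⊔ D_n."""
    return f"{C} ⊑ " + " ⊔ ".join(subtypes)


def exclusive_subtypes(C, subtypes):
    """Item 19: D_i ⊑ ⊓_{j>i} ¬D_j, plus D_i ⊑ C for every subtype."""
    axioms = []
    for i, d in enumerate(subtypes[:-1]):
        rest = " ⊓ ".join(f"¬{e}" for e in subtypes[i + 1:])
        axioms.append(f"{d} ⊑ {rest}")
    axioms += [subtype(d, C) for d in subtypes]
    return axioms
```

For example, `exclusive_subtypes("S", ["T", "U"])` produces the disjointness axiom for T and U followed by the two subsumptions under S.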
Arguably, ORM's reference scheme could have been included in the list of
constraints. It is depicted in expanded mode for object type O, with a mandatory
and 1:1 participation, i.e. the value type O_name has unique values by which
O is identified, which is also O's preferred reference scheme, indicated with a
double line above O_name's role (more reference schemes are possible, but one has to
choose a preferred one). Although the latter uses a different graphical element,
it does not change the logical representation and therefore has not been included
in the transformation above.

We demonstrate $\mathcal{DLR}_{ifd}$'s representation of ORM's fundamental notion of
fact type in the following example. 

Example 1 The typed ternary relation of Fig. 7, an ORM fact type, is
represented in $\mathcal{DLR}_{ifd}$ as follows:

$$\mathsf{PatientIsAdmittedToHospitalAtDateDate} \sqsubseteq (\mathsf{r1} : \mathsf{Patient}) \sqcap (\mathsf{haspatients} : \mathsf{Hospital}) \sqcap (\mathsf{r3} : \mathsf{Date})$$
where the name of the predicate is the one automatically generated by the NORMA
software, PatientIsAdmittedToHospitalAtDateDate, and the ORM-roles are indexed
from left to right, except for the second one, which has as name haspatients.
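The mandatory constraint corresponds to

$$\mathsf{Hospital} \sqsubseteq \exists[\mathsf{haspatients}]\,\mathsf{PatientIsAdmittedToHospitalAtDateDate}.$$
One can also reify this relation:

$$\begin{aligned}
\mathsf{PatientIsAdmittedToHospitalAtDateDate} \sqsubseteq{} & \exists[1]r_1 \sqcap (\leq 1[\mathsf{r1}]\,r_1) \sqcap \forall[\mathsf{r1}](r_1 \Rightarrow (\mathsf{r2} : \mathsf{Patient})) \sqcap{} \\
& \exists[1]r_2 \sqcap (\leq 1[\mathsf{r1}]\,r_2) \sqcap \forall[\mathsf{r1}](r_2 \Rightarrow (\mathsf{r2} : \mathsf{Hospital})) \sqcap{} \\
& \exists[1]r_3 \sqcap (\leq 1[\mathsf{r1}]\,r_3) \sqcap \forall[\mathsf{r1}](r_3 \Rightarrow (\mathsf{r2} : \mathsf{Date}))
\end{aligned}$$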
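
The typing, mandatory and reification assertions of Example 1 can be illustrated on a small finite population. This is only a sketch under stated assumptions: the instances ("p1", "h1", and so on) and the helper names `well_typed`, `mandatory` and `reify` are hypothetical, and a DL reasoner works on the axioms rather than on one particular population.

```python
# Hypothetical populations of the three object types.
patients = {"p1", "p2"}
hospitals = {"h1"}
dates = {"2007-01-05"}

# Extension of the ternary relation, components in role order
# (r1, haspatients, r3).
admitted = {("p1", "h1", "2007-01-05"), ("p2", "h1", "2007-01-05")}


def well_typed(relation, *domains):
    """Each component of each tuple belongs to the class typing that role."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in relation
    )


def mandatory(cls, relation, position):
    """Every instance of cls occurs at the given 1-based role position."""
    return cls <= {t[position - 1] for t in relation}


def reify(relation):
    """Replace each tuple by a fresh object with one functional link per role."""
    return {
        f"t{k}": {f"r{i}": value for i, value in enumerate(tup, start=1)}
        for k, tup in enumerate(sorted(relation), start=1)
    }


typed = well_typed(admitted, patients, hospitals, dates)
hospital_mandatory = mandatory(hospitals, admitted, 2)
reified = reify(admitted)
```

Each reified object has exactly one filler per link, which mirrors the existential and at-most-one conjuncts of the reification axiom.
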

The correctness of the encoding of this fragment of ORM2, let us call it ORM2$^-$,
can be proven by the same line of argumentation as Theorem 6.6 in [11].



Theorem 1 Let $\mathcal{D}$ be an ORM2$^-$ diagram and $\mathcal{K}_{\mathcal{D}}$ the $\mathcal{DLR}_{ifd}$ knowledge base
constructed as described above. Then every instantiation of $\mathcal{D}$ is a model of $\mathcal{K}_{\mathcal{D}}$,
and vice-versa.

Proof. Both the FOL formalization of the ORM2$^-$ diagram $\mathcal{D}$ and the $\mathcal{DLR}_{ifd}$
knowledge base $\mathcal{K}_{\mathcal{D}}$ are over the same alphabet, so their interpretations are
compatible. Considering each ORM2$^-$ construct separately as described in items
1-33, above, an interpretation satisfies its FOL formalization if and only if it
satisfies the corresponding $\mathcal{DLR}_{ifd}$ assertions. □

As with the results obtained by [11] for UML Class Diagrams, a consequence of
Theorem 1 is that reasoning on ORM2$^-$ diagrams can be performed by reasoning
on $\mathcal{DLR}_{ifd}$ knowledge bases, and, consequently, we obtain Theorems 2 and 3
(analogous to Theorem 6.7 resp. 6.8 in [11]).

Theorem 2 Let $\mathcal{D}$ be an ORM2$^-$ diagram and $\mathcal{K}_{\mathcal{D}}$ the $\mathcal{DLR}_{ifd}$ knowledge base
constructed as described above. Then an object type $C$ is consistent in $\mathcal{D}$ if and
only if the concept $C$ is satisfiable w.r.t. $\mathcal{K}_{\mathcal{D}}$.

Theorem 3 Object type consistency in ORM2$^-$ diagrams is ExpTime-complete.

While these are encouraging results for reasoning over ORM2$^-$ diagrams, it is
also useful to look at why we have only ORM2$^-$ but not the full ORM2 and what
the prospects are for any future extension. This has been discussed extensively
in [10] and concerns the impossibility of representing ORM's ring constraints, i.e.,
DL role properties, in $\mathcal{DLR}_{ifd}$, and certain arbitrary projections that make the
language undecidable (such as multi-role frequency, depicted in Fig. 2 on p11).

4 Discussion and related work 

To assess the merits of the ORM to $\mathcal{DLR}_{ifd}$ mapping (or any other DL, for
that matter), we first have to address the previous attempt by Jarrar [8] and
subsequently we will widen the scope.

Related work. We deal first with the claimed ORM to $\mathcal{DLR}_{ifd}$ "rules" with
respect to the logic, then with respect to ORM. Jarrar introduces STRING and
NUMBER as concepts with the intention to stand in for data types for
restricting the values (p190). However, because they are primitive concepts, the
"$\{x_1, \ldots, x_n\}$" denote objects, not values, i.e., they use DL's one-of constructor,
which is neither in $\mathcal{DLR}_{ifd}$'s syntax nor the intention of value restrictions,
thereby invalidating rules 12 and 13; compare this with nr. 2 and nr. 7, above.
For role and relation subset constraints, Jarrar chose to avail of a different
language, DLR-Lite, to deal with projections; it is, however, ORM's freedom of
allowing arbitrary projections that makes the language undecidable (together
with uniqueness constraints, one may regain decidability), hence rule 16 (p191)
as such cannot be applied. Further, all mappings and suggestions for ring
constraints are incorrect (pp192-193). $\mathcal{DLR}_{ifd}$ is a rather poor language
for relational properties and one may be better off with $\mathcal{DLR}_{\mu}$ or
$\mathcal{SROIQ}$ if ring constraints are one's only interest. More precisely,
$\mathcal{DLR}_{ifd}$ uses a rewriting for inverses as shown at the end of section 2, which is
different from the semantic rule
$(P^-)^{\mathcal{I}} = \{(a, b) \in \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \mid (b, a) \in P^{\mathcal{I}}\}$, and
therefore the role inclusions with inverse that are needed for symmetry and
asymmetry do not have a $\mathcal{DLR}_{ifd}$-equivalent (rules 23 and 24). For antisymmetry and
irreflexivity, $\exists R.\mathsf{Self}$ is used, which is not in $\mathcal{DLR}_{ifd}$ but taken from $\mathcal{SROIQ}$
(note also that antisymmetry in ORM is the normal version, not "irreflexive
antisymmetry" (i.e., asymmetry) [25,10]). Also, the role composition operator
is not in $\mathcal{DLR}_{ifd}$, thereby invalidating rule 26 for the intransitive ring constraint.
Last, if one insists on having acyclicity in the language, then $\mathcal{DLR}_{\mu}$ is an option,
because there one can represent it using the least/greatest fixpoint operator [26].

There are three further issues from the ORM perspective as well as DL usage.
First, Jarrar introduces a "$\sqsubset$" to capture the notion of proper subtyping
whilst admitting it is not part of the syntax and claiming that "it can be
implemented by reasoning on the ABox to make sure that the population of A
and the population of B are not equal" (p189). However, subtyping is about the
intension of the concept, not about the extension (population) at some time
(see also [13] p247). That is, when we have, say, $B \sqsubseteq A$, then a database state
where population(B) = population(A) is admissible, be they both empty sets
or coincidentally have the same instances; this also means we cannot delegate
the reasoning to the ABox as Jarrar proposes. In fact, checking for subtyping
is addressed by DL reasoners already. Second, the exclusion constraint in rule
11 (p190) assumes disjointness among two arbitrary classes, but this is never
the case, neither in ORM and ORM2 nor in UML or EER; i.e., disjointness is
among subtyped classes of supertype C (see nr. 19). Third, unaries (1-role fact
types) cannot be represented in $\mathcal{DLR}_{ifd}$ other than by making a binary
relation of it with an auxiliary object- or value type. Jarrar tries to solve this by
introducing a not further specified concept BOOLEAN where the values ought
to be restricted to 'true' or 'false'; hence, as mentioned above, making 'true'
and 'false' objects instead of values. If introduced, it is a data type (also called
'concrete domain'), but to accommodate ORM's flexibility and suggestion of the
alternative notation for unaries ([13] p83), any new arbitrary C will do, be it
an object or value type. For instance, for a unary Walks linked to Person (i.e.,
$\forall x (\mathit{Walks}(x) \rightarrow \mathit{Person}(x))$), we could introduce a value type Walking that has
domain String and values restricted to, say, 'yes', 'no', and 'walkingaid'.

Concerning elegance in the mappings, there are three points. First, normally
the relations contributing to identification in an external uniqueness constraint
are directly related to an object type for which the identification is intended,
which is also the case in Jarrar's example when one uses the normal compact
representation for reference modes. In theory, ORM does not seem to exclude
modelling exotic external uniqueness constraints where roles of different relations
are combined creatively; however, it is not specifically included in any ORM
formalization [21,23] and ORM modelling software does restrict its usage (NORMA
prohibits setting the constraint as preferred identifier in a path-based external
uniqueness constraint) or does not mention it (FCO-IM). To accommodate it
nevertheless, one does not need a concept "Top" to stand in as a natural
language version for $\top$ (as in rule 7, p188), because introducing a new placeholder
concept C suffices, which, if desired, may well be a subtype of another object type
(see nr. 14). Second, uniqueness on n-ary relations is not ideal with fd (rule 6)
because of the exceptions, and can be done more consistently with id; in
contrast, fd's are useful in particular for derived and stored relations (UML methods
[11]). Third, the use of $\bot$ in Jarrar's rule 8 for the role frequency constraints of
the 'at least a and at most b' is overly cumbersome, given that a straightforward
conjunction ($\sqcap$) suffices (see nr. 16, above).
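
The point about role frequency can be illustrated on a finite population: 'at least a and at most b' is simply the conjunction of two independent bounds. The sketch below is hypothetical (the function `within_frequency` and the sample tuples are not from the source), and it counts only objects that actually play the role, which is an assumption about the intended reading of the frequency constraint.

```python
from collections import Counter


def within_frequency(relation, position, a, b):
    """Check that each object playing the role at the 1-based position
    does so at least a and at most b times."""
    counts = Counter(t[position - 1] for t in relation)
    at_least = all(n >= a for n in counts.values())
    at_most = all(n <= b for n in counts.values())
    # The constraint is the plain conjunction of the two bounds.
    return at_least and at_most


# Hypothetical binary fact type: (student, course).
enrolled = {("s1", "c1"), ("s1", "c2"), ("s2", "c1"), ("s2", "c3")}

students_ok = within_frequency(enrolled, 1, 2, 3)  # each student: 2 courses
courses_ok = within_frequency(enrolled, 2, 2, 3)   # c2 and c3: 1 student each
```
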

General discussion. To put these issues in a broader framework, we observe
the following. ORM and ORM2 have had a formal foundation for about 20 years,
but when one looks at the details, there is no 'standard ORM' like the UML
specification [15]. From a logician's perspective, it then seems fair to choose a
convenient fragment that fits with one's favourite DL language and define an
'ORM*' (or a UML* or EER* for that matter) so that each ORM* diagram
(in casu, ORM2$^-$) has an equi-satisfiable DL ($\mathcal{DLR}_{ifd}$) knowledge base and use
the reasoning procedures and complexity results of the chosen language. (This
has been done also for UML [11], but then with the argumentation that UML
being officially informal, one can choose one's own reading of the icons.)
However, it is not the case that any DL language will do just fine. With $\mathcal{DLR}_{\mu}$
or $\mathcal{SROIQ}$ (the basis for OWL 2), we gain regarding the role properties, but
lose n-ary relations where $n > 2$, id, and fd, hence, also correct objectification,
multi-attribute keys, external uniqueness (also called weak entity types or
qualified associations), and derived-and-stored fact types (UML methods). Note
that the former language is still ExpTime-complete, but the latter already
2-NExpTime. Clearly, if one takes the assumption that more features are better,
then a formalisation would look different, where one also would be able to add
all temporal constraints, deontic constraints, and what have you. This, however,
makes practical automated reasoning over conceptual data models unrealistic
(more precisely: undecidable). The performance-oriented modeller may want to
go to even lower complexity, such as with the DL-Lite family of DLs, and
sacrifice even more features compared to the presented mapping so as to stay
within NP or NLogSpace complexity (e.g. [7,2]). Which features are more
important, and whether one should have many features or fewer for understandability,
is a long-standing debate, which we do not want to go into here. There are many
options to choose from regarding DLs (the combination of features, complexity,
extant implementations), which, perhaps, for a conceptual modeller may be
off-putting compared to, say, straightforwardly writing an encoding of a
conceptual model as a Constraint Satisfaction Problem (CSP). But note that it is
exactly the formal characterisation of a conceptual data modelling language that
helps these efforts: if each tool with a constraint-based approach would decide on
its own formalizations and CSP encoding, then we would face the situation that
each reasoner might come to other derivations, which would not build confidence
among the user base of conceptual modellers. Developers of DL-based reasoners,
on the other hand, already have a well-established coordination and the
reasoners do adhere to the DIG standard (http://dl.kr.org/dig/) that provides uniform
access to the DL reasoners such as FaCT++, Racer and QuOnto.

Overall, using a DL, and with respect to conceptual data modelling languages
$\mathcal{DLR}_{ifd}$ in particular, has several advantages, such as use of a well-studied
family of decidable formal languages with model-theoretic semantics, insight in the
computational properties, and availability and active ongoing development of
automated reasoners. Another interesting benefit of having carried out the mapping
with ORM and $\mathcal{DLR}_{ifd}$ is that we now can easily compare it with the mappings
from UML and EER to $\mathcal{DLR}_{ifd}$ and offer an option to conduct a comparison
and unification that could be added to the $\mathcal{DLR}$-based Racer-enhanced
proof-of-concept ICOM conceptual modelling tool [www.inf.unibz.it/~franconi/icom].
It also served to demonstrate that the complex UML qualified associations and
association end subsetting, not covered by Berardi et al.'s mapping, can
easily be represented in $\mathcal{DLR}_{ifd}$ as shown in nr. 14 and nr. 21, respectively. In
addition, it can facilitate the investigations into ontological foundations of
conceptual data modelling and its languages because there is at least a common,
precise, vocabulary and semantics.

5 Conclusions 

We have transformed most ORM/ORM2 features into an equivalent $\mathcal{DLR}_{ifd}$
representation, and $\mathcal{DLR}_{ifd}$ only; thus, any ORM2$^-$ diagram has an equi-satisfiable
$\mathcal{DLR}_{ifd}$ knowledge base. This ExpTime-complete ORM2$^-$ suffices for many
conceptual data models in practice, i.e., $\mathcal{DLR}_{ifd}$ suffices so that there is the benefit
of interoperability among EER, UML Class Diagrams and ORM through a common
formal language. In addition, by focussing on a very expressive language,
we extracted two additional mappings from UML Class Diagrams to $\mathcal{DLR}_{ifd}$,
being qualified association and subsetting of association ends, and illustrated
trade-offs for choosing an adequate DL language.

Current investigation focuses on exploring options to add useful temporal
operators and on adding id and fd to $\mathcal{DLR}_{\mu}$.